Multi-Relational Model Tree Induction Tightly-Coupled with a Relational Database
Authors
Abstract
Multi-Relational Data Mining (MRDM) refers to the process of discovering implicit, previously unknown, and potentially useful information from data scattered across multiple tables of a relational database. Following the mainstream of MRDM research, we tackle the regression task, where the goal is to examine samples of past experience with known continuous answers (the response) and generalize to future cases through an inductive process. Mr-SMOTI, the solution we propose, resorts to the structural approach in order to recursively partition data stored in a tightly-coupled database and build a multi-relational model tree that captures the linear dependence between the response variable and one or more explanatory variables. The model tree is induced top-down by choosing, at each step, either to partition the training space or to introduce a regression variable into the linear models associated with the leaves. The tight coupling with the database makes knowledge of the data structures (foreign keys) available free of charge to guide the search in the multi-relational pattern space. Experiments on artificial and real databases demonstrate that, in general, Mr-SMOTI outperforms both SMOTI and M5’, two propositional model tree induction systems, and TILDE-RT, a state-of-the-art structural model tree induction system.
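The choice between a partitioning step and a regression step, with explanatory variables reached through a foreign key, can be illustrated with a small sketch. This is not Mr-SMOTI's actual algorithm or evaluation function: the `shop` and `sale` tables, the column names, and the scoring by residual sum of squares (RSS) are all assumptions made for illustration, and the split candidates are scored with a constant (mean) model on each side for simplicity.

```python
import sqlite3

# Hypothetical two-table schema: each sale references one shop via a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (id INTEGER PRIMARY KEY, size REAL);
    CREATE TABLE sale (
        id INTEGER PRIMARY KEY,
        shop_id INTEGER REFERENCES shop(id),
        price REAL,
        response REAL
    );
    INSERT INTO shop VALUES (1, 10), (2, 20);
    INSERT INTO sale VALUES
        (1, 1, 1, 3), (2, 1, 2, 5), (3, 1, 3, 7),
        (4, 2, 4, 9), (5, 2, 5, 11), (6, 2, 6, 13);
""")


def load_columns(conn):
    """Follow the foreign key sale.shop_id -> shop.id inside the database."""
    rows = conn.execute(
        "SELECT s.price, h.size, s.response "
        "FROM sale AS s JOIN shop AS h ON s.shop_id = h.id "
        "ORDER BY s.id"
    ).fetchall()
    price, size, response = (list(col) for col in zip(*rows))
    return {"price": price, "size": size}, response


def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def regression_rss(xs, ys):
    """RSS of the least-squares straight line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return sse(ys)  # constant x: the line degenerates to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def split_rss(xs, ys):
    """Lowest total RSS over thresholds x <= t, with a mean model per side."""
    best = sse(ys)  # baseline: no split at all
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        best = min(best, sse(left) + sse(right))
    return best


def choose_step(columns, response):
    """Pick the candidate step (regression or split) with the lowest RSS."""
    candidates = []
    for name, xs in columns.items():
        candidates.append((regression_rss(xs, response), "regression", name))
        candidates.append((split_rss(xs, response), "split", name))
    return min(candidates, key=lambda c: c[0])


columns, response = load_columns(conn)
step = choose_step(columns, response)
print(step)
```

In this toy data the response is exactly linear in `price`, so the regression step on `price` is selected over any split. The join runs inside the database, which is the sense in which the foreign key guides which variables are considered; a real system would also need to handle one-to-many links and recurse on the resulting partitions.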
Similar resources
MR-SMOTI: A Data Mining System for Regression Tasks Tightly-Coupled with a Relational Database
Tight coupling of data mining and database systems is a key issue in inductive databases. It ensures scalability, direct and uniform access to both data and patterns stored in databases, as well as proper exploitation of information embedded in the database schema to drive the mining process. In this paper we present a new data mining system, named Mr-SMOTI, which is able to mine (multi-)relati...
Generalized Stochastic Tree Automata for Multi-relational Data Mining
This paper addresses the problem of learning a statistical distribution of data in a relational database. Data we want to focus on are represented with trees which are a quite natural way to represent structured information. These trees are used afterwards to infer a stochastic tree automaton, using a well-known grammatical inference algorithm. We propose two extensions of this algorithm: use o...
Top-Down Induction of Relational Model Trees in Multi-instance Learning
(Multi-)relational regression consists of predicting continuous response of target objects called reference objects by taking into account interactions with other objects called task-relevant objects. In relational databases, reference objects and task-relevant objects are stored in distinct data relations. Interactions between objects are expressed by means of (many-to-one) foreign key constra...
Multi-relational Decision Tree Induction
Discovering decision trees is an important set of techniques in KDD, both because of their simple interpretation and the efficiency of their discovery. One of their disadvantages is that they do not take the structure of the mining object into account. By going from the standard single-relation approach to the multi-relational approach as in ILP this disadvantage is removed. However, the straig...
Developing Tightly-Coupled Data Mining Applications on a Relational Database System
We present a methodology for tightly coupling data mining applications to database systems to build high-performance applications, without requiring any change to the database software.
Journal title:
- Fundam. Inform.
Volume 129, Issue
Pages -
Publication date 2014